Word Lattice Reranking for Chinese Word Segmentation and Part-of-Speech Tagging

نویسندگان

  • Wenbin Jiang
  • Haitao Mi
  • Qun Liu
چکیده

In this paper, we describe a new reranking strategy named word lattice reranking, for the task of joint Chinese word segmentation and part-of-speech (POS) tagging. As a derivation of the forest reranking for parsing (Huang, 2008), this strategy reranks on the pruned word lattice, which potentially contains much more candidates while using less storage, compared with the traditional n-best list reranking. With a perceptron classifier trained with local features as the baseline, word lattice reranking performs reranking with non-local features that can’t be easily incorporated into the perceptron baseline. Experimental results show that, this strategy achieves improvement on both segmentation and POS tagging, above the perceptron baseline and the n-best list reranking.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Using Part-of-Speech Reranking to Improve Chinese Word Segmentation

Chinese word segmentation and Part-ofSpeech (POS) tagging have been commonly considered as two separated tasks. In this paper, we present a system that performs Chinese word segmentation and POS tagging simultaneously. We train a segmenter and a tagger model separately based on linear-chain Conditional Random Fields (CRF), using lexical, morphological and semantic features. We propose an approx...

متن کامل

Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?

Chinese part-of-speech (POS) tagging assigns one POS tag to each word in a Chinese sentence. However, since words are not demarcated in a Chinese sentence, Chinese POS tagging requires word segmentation as a prerequisite. We could perform Chinese POS tagging strictly after word segmentation (one-at-a-time approach), or perform both word segmentation and POS tagging in a combined, single step si...

متن کامل

An Enhanced Model for Chinese Word Segmentation and Part-of-Speech Tagging

This paper will present an enhanced probabilistic model for Chinese word segmentation and part-of-speech (POS) tagging. The model introduces the information of Chinese word length as one of its features to reach a more accurate result. And in addition, the model also achieves the integration of segmentation and POS tagging. After presenting the model, this paper will give a brief discussion on ...

متن کامل

Mandarin Part-of-Speech Tagging and Discriminative Reranking

We present in this paper methods to improve HMM-based part-of-speech (POS) tagging of Mandarin. We model the emission probability of an unknown word using all the characters in the word, and enrich the standard left-to-right trigram estimation of word emission probabilities with a right-to-left prediction of the word by making use of the current and next tags. In addition, we utilize the RankBo...

متن کامل

Reduce Meaningless Words for Joint Chinese Word Segmentation and Part-of-speech Tagging

Conventional statistics-based methods for joint Chinese word segmentation and partof-speech tagging (S&T) have generalization ability to recognize new words that do not appear in the training data. An undesirable side effect is that a number of meaningless words will be incorrectly created. We propose an effective and efficient framework for S&T that introduces features to significantly reduce ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008